Hybrid Neural-global Minimization Method of Logical Rule Extraction
Authors
Abstract
A methodology for extracting optimal sets of logical rules using neural networks and global minimization procedures has been developed. Initial rules are extracted using density estimation neural networks with rectangular functions, or multi-layered perceptron (MLP) networks trained with a constrained backpropagation algorithm that transforms MLPs into simpler networks performing logical functions. A constructive algorithm called C-MLP2LN is proposed, in which rules of increasing specificity are generated consecutively by adding more nodes to the network. Neural rule extraction is followed by optimization of the rules using global minimization techniques. Estimation of the confidence of various sets of rules is discussed. The hybrid approach to rule extraction has been applied to a number of benchmark and real-life problems with very good results.

1. Introduction to logical rules

Why should one use logical rules if other methods of classification – machine learning, pattern recognition or neural networks – may be easier to use and give better results? Adaptive systems M(W), including the most popular neural network models, are useful classifiers that adjust internal parameters W, performing vector mappings from the input to the output space. Although they may achieve high classification accuracy, the knowledge acquired by such systems is represented by a set of numerical parameters and network architectures in an incomprehensible way. Logical rules should be preferred over other methods of classification provided that the set of rules is not too complex and the classification accuracy is sufficiently high. Surprisingly, in many applications simple rules proved to be more accurate and generalized better than various machine and neural learning algorithms. Many statistical, pattern recognition and machine learning [1] methods of finding logical rules have been designed in the past. Neural networks are also used to extract logical rules and select classification features.
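The constrained backpropagation behind C-MLP2LN can be illustrated by the kind of penalty term it adds to the standard cost function: one term prunes weights toward zero, another forces the surviving weights toward ±1, so the trained MLP degenerates into a network computing a logical function. This is a minimal sketch; the exact penalty form and the coefficient values here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def mlp2ln_penalty(weights, lam1=1e-3, lam2=1e-2):
    """Regularization term added to the usual backprop cost.

    lam1 * sum(w^2) pushes weights toward zero (feature/connection pruning),
    while lam2 * sum(w^2 * (w-1)^2 * (w+1)^2) pushes the remaining weights
    toward +1 or -1, so the network approximates a logical (binary) function.
    Coefficients lam1, lam2 are illustrative, not values from the paper.
    """
    w = np.asarray(weights, dtype=float)
    return lam1 * np.sum(w ** 2) + lam2 * np.sum(w ** 2 * (w - 1) ** 2 * (w + 1) ** 2)
```

Note that the second term vanishes exactly when every weight is in {-1, 0, +1}, which is what turns the trained network into a readable set of rules.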
Unfortunately, a systematic comparison of neural and machine learning methods is still missing. Many neural rule extraction methods have recently been reviewed and compared experimentally [2], therefore we will not discuss them here. Neural methods focus on the analysis of parameters (weights and biases) of trained networks, trying to achieve high fidelity of performance, i.e. similar classification results from the extracted logical rules and from the original networks. Non-standard forms of rules, such as M-of-N rules (M out of N antecedents should be true), fuzzy rules, or decision trees [1] are sometimes useful, but in this paper we will consider only standard IF ... THEN propositional rules. In classification problems propositional rules may take several forms. A very general form of such a rule is: IF X ∈ K(i) THEN Class(X) = C_i, i.e. if X belongs to the cluster K(i) then its class is C_i = Class(K(i)), the same as for all vectors in this cluster. If clusters overlap, a nonzero probability of classification p(C_i|X; M) for several classes is obtained. This approach does not restrict the shapes of the clusters used in logical rules, but unless the clusters are visualized in some way (a difficult task in highly dimensional feature spaces) it does not give more understanding of the data than any black-box classifier. A popular simplification of the most general form of logical rules is to describe clusters using separable "membership" functions. This leads to fuzzy rules, for example in the form:

p(C_k|X; M) = μ^(k)(X) / Σ_i μ^(i)(X)    (1)

where

μ^(k)(X) = ∏_i μ_i^(k)(X_i)    (2)

and μ^(k)(X) is the value of the membership function defined for cluster k. Such context-dependent or cluster-dependent membership functions are used in the Feature Space Mapping (FSM) neurofuzzy system [3]. The flexibility of this approach depends on the choice of membership functions. Fuzzy logic classifiers most frequently use a few triangular membership functions per input feature [4].
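Equations (1)–(2) can be sketched directly in code: each cluster's membership is a separable product of per-feature membership functions, and the class probability is the normalized membership. Gaussian per-feature functions are used here only as one concrete choice; the paper also discusses triangular, trapezoidal and rectangular ones.

```python
import numpy as np

def membership(x, centers, widths):
    # Eq. (2): separable product of per-feature membership functions
    # mu_i^(k)(x_i); Gaussian shape is an illustrative choice.
    x, c, s = (np.asarray(v, dtype=float) for v in (x, centers, widths))
    return float(np.prod(np.exp(-((x - c) / s) ** 2)))

def fuzzy_posterior(x, clusters):
    # Eq. (1): clusters is a list of (centers, widths), one per cluster k;
    # returns p(C_k | X; M) = mu^(k)(X) / sum_i mu^(i)(X).
    mu = np.array([membership(x, c, s) for c, s in clusters])
    return mu / mu.sum()
```

A vector near one cluster's center receives most of the probability mass, while vectors in overlap regions get nonzero probability for several classes, as described in the text.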
These functions do not depend on the region of the input space, providing oval decision borders similar to those of Gaussian functions (cf. Fig. 1). Thus fuzzy rules give decision borders that are not much more flexible than those of crisp rules. More important than softer decision borders is the ability to deal with oblique distributions of data by rotating some decision borders. This requires new linguistic variables formed by taking linear combinations or making non-linear transformations of input features, but the meaning of such rules is sometimes difficult to comprehend (cf. the proverbial "mixing apples with oranges"). Logical rules require symbolic inputs (linguistic variables), therefore the input data has to be quantized first, i.e. the features defining the problem should be identified and their values (sets of symbolic or integer values, or continuous intervals) labeled. For example, a variable "size" will have the value "small" if the continuous variable x_k measuring size falls in some specified range, x_k ∈ [a, b]. Using one input variable, several binary (logical) variables may be created, for example s1 = (size, small), equal to 1 (true) only if the variable "size" has the value "small". Rough set theory [5] is used to derive crisp logic propositional rules. In this theory, for two-class problems the lower approximation of the data is defined as a set of vectors, or a region of the feature space, containing input vectors that belong to a single class with probability = 1, while the upper approximation covers all instances which have a chance to belong to this class (i.e. probability > 0). In practice the shape of the boundary between the upper and the lower approximations depends on the indiscernibility (or similarity) relation used. A linear approximation to the boundary region leads to trapezoidal membership functions. The simplest crisp form of logical rules is obtained if trapezoidal membership functions are changed into rectangular functions.
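The quantization step described above, turning a continuous feature into binary linguistic variables via intervals, can be sketched as follows. The "size" variable and its interval boundaries are the example from the text; the exact boundary values are illustrative assumptions.

```python
def make_linguistic(intervals):
    # intervals: {label: (lo, hi)}; returns one binary (crisp) predicate
    # per label, true only when the continuous value falls in [lo, hi].
    return {label: (lambda x, lo=lo, hi=hi: lo <= x <= hi)
            for label, (lo, hi) in intervals.items()}

# Illustrative quantization of the continuous variable x_k measuring "size":
size = make_linguistic({"small": (0.0, 1.0), "medium": (1.0, 2.0), "large": (2.0, 5.0)})

# s1 = (size, small) is 1 (true) only if "size" has the value "small":
s1 = size["small"](0.4)
```

Replacing the closed intervals with rectangular membership functions returning 0/1 gives exactly the crisp-rule antecedents discussed in the text.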
Rectangles allow one to define logical linguistic variables for each feature by intervals or sets of nominal values. A fruitful way of looking at logical rules is to treat them as an approximation to the posterior probability of classification p(C_i|X; M), where the model M is composed of the set of rules. Crisp, fuzzy and rough set decision borders are special cases of the FSM neurofuzzy approach [3], based on separable functions used to estimate the classification probability. Although the decision borders of crisp logical rule classification are much simpler than those achievable by neural networks, results are sometimes significantly better. Three possible explanations of this empirical observation are: 1) the inability of soft sigmoidal functions to represent the sharp, rectangular edges that may be necessary to separate two classes; 2) the problem of finding a globally optimal solution of the non-linear optimization problem for neural classifiers – since we use a global optimization method to improve our rules, there is a chance of finding a better solution than gradient-based neural classifiers are able to find; 3) the problem of finding an optimal balance between the flexibility of adaptive models and the danger of overfitting the data. Although Bayesian regularization [6] may help in the case of some neural and statistical classification models, logical rules give much better control over the complexity of the data representation and the elimination of outliers. We are sure that in all cases, independently of the final classifier used, it is advantageous to extract crisp logical rules.

Figure 1: Shapes of decision borders for general clusters, fuzzy rules (using products of membership functions), rough rules (trapezoidal approximation) and logical rules.
First, in our tests logical rules proved to be highly accurate; second, they are easily understandable by experts in a given domain; third, they may expose problems with the data itself. This became evident in the case of the "Hepar dataset", collected by Dr. H. Wasyluk and co-workers from the Postgraduate Medical Center in Warsaw. The data contained 570 cases described by 119 values of medical tests and other features, collected over a period of more than 5 years. These cases were divided into 16 classes, corresponding to different types of liver disease (the final diagnosis was confirmed by analysis of liver samples under a microscope, a procedure that we try to avoid by providing a reliable diagnosis system). We have extracted crisp logical rules from this dataset using our C-MLP2LN approach described below, and found that 6 very simple rules gave 98.5% accuracy. Unfortunately, these rules also revealed that the missing attributes in the data were replaced by the averages for a given class; for example, cirrhosis was fully characterized by the rule "feature5=4.39". Although one may re-
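The global optimization of extracted rules mentioned in the abstract can be illustrated on the simplest case: tuning the interval boundaries of a single one-feature rule to maximize classification accuracy. The paper uses proper global minimization techniques; this sketch substitutes a crude random search over the boundaries, with toy data, purely to show the idea of derivative-free global optimization of rule intervals.

```python
import random

def rule_accuracy(bounds, xs, ys):
    # Accuracy of the crisp rule "IF lo <= x <= hi THEN class 1 ELSE class 0".
    lo, hi = sorted(bounds)
    return sum((1 if lo <= x <= hi else 0) == y for x, y in zip(xs, ys)) / len(ys)

def optimize_rule(xs, ys, iters=2000, seed=0):
    # Crude global random search over interval boundaries; a stand-in for
    # the global minimization techniques used in the paper.
    rng = random.Random(seed)
    lo_all, hi_all = min(xs), max(xs)
    best, best_acc = (lo_all, hi_all), rule_accuracy((lo_all, hi_all), xs, ys)
    for _ in range(iters):
        cand = (rng.uniform(lo_all, hi_all), rng.uniform(lo_all, hi_all))
        acc = rule_accuracy(cand, xs, ys)
        if acc > best_acc:
            best, best_acc = cand, acc
    return best, best_acc
```

Because the rule's accuracy is a piecewise-constant, non-differentiable function of the interval boundaries, gradient methods are of no use here, which is why search-based global optimization is the natural fit for the rule-refinement step.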
Similar papers
A hybrid method for extraction of logical rules from data
A hybrid method for extraction of logical rules from data has been developed. The hybrid method is based on a constrained multi-layer perceptron (C-MLP2LN) neural network for selection of relevant features and extraction of a preliminary set of logical rules, followed by a search-based optimization method using a global minimization technique. Constraints added to the cost function change the MLP net...
Fuzzy and crisp logical rule extraction methods in application to medical data
A comprehensive methodology of extraction of optimal sets of logical rules using neural networks and global minimization procedures has been developed. Initial rules are extracted using density estimation neural networks with rectangular functions or multi-layered perceptron (MLP) networks trained with a constrained backpropagation algorithm, transforming MLPs into simpler networks performing log...
Methodology of extraction, optimization and application of logical rules
Methodology of extraction, optimization and application of sets of logical rules is described. The three steps are largely independent. Neural networks are used for initial rule extraction, local or global minimization procedures for optimization, and Gaussian uncertainties of measurements are assumed during application of logical rules. A tradeoff between rejection/error level is discussed. Th...
A new methodology of extraction, optimization and application of crisp and fuzzy logical rules
A new methodology of extraction, optimization, and application of sets of logical rules is described. Neural networks are used for initial rule extraction, local or global minimization procedures for optimization, and Gaussian uncertainties of measurements are assumed during application of logical rules. Algorithms for extraction of logical rules from data with real-valued features require dete...
Neural optimization of linguistic variables and membership functions
Algorithms for extraction of logical rules from data that contains real-valued components require determination of linguistic variables or membership functions. Context-dependent membership functions for crisp and fuzzy linguistic variables are introduced and methods of their determination described. Methodology of extraction, optimization and application of sets of logical rules is described. ...
Journal: JACIII
Volume 3, Issue -
Pages: -
Year of publication: 1999